Abstract
Background:
Karyotyping by chromosome banding analysis (CBA) is standard-of-care in hematological malignancies and important for diagnosis, classification and prognosis. Karyotypes are documented in a standardized manner according to the International System for Human Cytogenetic Nomenclature (ISCN). However, this text-based format has so far largely precluded the use of cytogenetic data for large-scale computational analysis.
Aims:
1. To develop a computational pipeline to translate ISCN-karyotypes into a computer-interpretable form and use automated feature selection on sparse and high-dimensional data to build a prognostic model based on cytogenetic aberrations and other known prognostic markers.
2. As proof-of-principle, apply the pipeline to a large CLL cohort.
Methods:
Cytogenetic aberrations were determined by CBA for 1,229 CLL patients diagnosed between 2005 and 2024 (median age: 67 years [30-94]; female: 36%). CLL diagnoses were established by cytomorphology and immunophenotyping. IGHV mutation status was determined according to ERIC recommendations and TP53 mutation status by targeted NGS panels or Sanger sequencing. The CytoGPS tool (Abrams et al., Bioinformatics, 24, 2019) was used to convert ISCN karyotypes into binary Loss-Gain-Fusion (LGF) vectors at the resolution of cytogenetic bands. Fusion events encompassed all abnormal juxtapositions of bands, both in translocations and intrachromosomal deletions. Redundant feature filtering reduced the initial 3 x 916-dimensional output to 1,003 unique features. For robust feature prioritization on high-dimensional data, we used a penalized Cox model with elastic net regularization, with hyperparameter tuning and stability-enhanced feature selection by 5-fold cross-validation on an 80% training set; the remaining 20% was used for model evaluation. Established prognostic factors in CLL (IGHV status, TP53 status, age, sex) were integrated. Kaplan-Meier analysis was used to analyze overall survival (OS).
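The redundant feature filtering step can be illustrated with a minimal sketch. The abstract does not state the exact filtering rule, so the rule used here (collapse features with identical patterns across all patients and drop features never observed in the cohort) is an assumption, and the function name, feature labels and toy data are hypothetical.

```python
def drop_redundant_features(matrix, names):
    """Collapse binary feature columns that are identical across patients.

    matrix: list of rows (one per patient), each a list of 0/1 values.
    names:  column labels, one per feature.
    Returns (filtered_matrix, groups); groups maps each kept label to
    all labels that share its pattern.
    Assumption: all-zero columns carry no information and are dropped.
    """
    patterns = {}  # column pattern -> labels sharing it (insertion-ordered)
    for j, name in enumerate(names):
        pattern = tuple(row[j] for row in matrix)
        if not any(pattern):
            continue  # feature never observed in this cohort
        patterns.setdefault(pattern, []).append(name)

    kept = list(patterns.items())
    filtered = [[pattern[i] for pattern, _ in kept] for i in range(len(matrix))]
    groups = {labels[0]: labels for _, labels in kept}
    return filtered, groups


# Hypothetical LGF-style features for three patients.
names = ["loss_13q14.2", "loss_13q14.3", "fusion_13q14", "gain_12q13", "loss_1p36"]
toy = [
    [1, 1, 1, 0, 0],
    [0, 0, 0, 1, 0],
    [1, 1, 0, 0, 0],
]
filtered, groups = drop_redundant_features(toy, names)
print(groups)
```

Keeping the group membership alongside the reduced matrix allows a selected feature to be traced back to every band that shares its pattern, which matters when adjacent bands are always lost together.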
Results:
Analysis of LGF vectors across chromosomes revealed that the most prevalent losses of cytogenetic bands involved chromosomes 13q (48% of cases), 11q (12%), 17p (5%), and 14q (4%). Gains of bands were most common on chromosome 12 (13%, mostly by trisomy 12), followed by gains on chromosomes 2p (4%) and 8q (3%). Fusion events were most often observed on chromosomes 13q (47%, mostly related to intrachromosomal deletions), 14q (9%, deletions and translocations), and 11q (13%, deletions and translocations). To identify robust prognostic markers for OS, multivariate Cox regression was applied to cytogenetic features and established prognostic factors. Cross-validation and stability selection demonstrated high reproducibility of the selected predictors and retrieved 16 features as highly predictive of OS (C-index of 0.79). Among the statistically significant features were loss of 17p10 (HR=3.1, p=0.01) and loss of 11q24 (HR=2.1, p=0.02), related to the well-established risk factors del(17p) and del(11q), respectively. Interestingly, losses within 6q24-26 were also strongly associated with a poor prognosis (HR=3.12, p<0.001). Deletions in 6q have previously been reported as a recurrent cytogenetic aberration in CLL; however, their prognostic significance has remained controversial. In our cohort, 4% of all cases showed loss of bands within 6q24-26. These cases were strongly enriched for unmutated IGHV status (IGHV-U) (11.6-fold, p<0.001). Among IGHV-U cases, the loss of bands within 6q24-26 was associated with inferior survival (median OS of 6.2 vs. 12.5 years, p<0.001). For mutated IGHV (IGHV-M), the number of cases (5) was not sufficient to assess effects on OS. In multivariate Cox regression analysis, loss of bands within 6q24-26 had an independent negative prognostic impact (HR=2.1, p<0.001), as had age (HR=1.07, p<0.001), IGHV-M (HR=0.51, p<0.001), TP53 aberrations (HR=1.97, p<0.001) and high-complex karyotype (5+ aberrations, HR=1.62, p=0.015).
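For readers unfamiliar with the C-index reported above, the following is a minimal sketch of Harrell's concordance index for right-censored survival data. It is not the evaluation code used in the study; the function name and the toy follow-up times, event indicators and risk scores are hypothetical.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index for right-censored data.

    times:  follow-up time per patient.
    events: 1 if death was observed, 0 if censored.
    risks:  model risk score (higher = worse predicted prognosis).
    A pair is comparable when the patient with the shorter time had an
    observed event; ties in risk score count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: four patients, the third one censored.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risks = [0.9, 0.3, 0.5, 0.1]
c_index = concordance_index(times, events, risks)
print(round(c_index, 2))  # 0.8 (4 of 5 comparable pairs are concordant)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ranking of patients by risk.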
Conclusions:
We have developed a computational pipeline to translate ISCN-karyotypes into a computer-interpretable form and then use automated feature selection to build a prognostic model. In CLL, this identified known prognostic cytogenetic markers but also highlighted losses within 6q24-26 as a strong negative prognostic factor, which correlated with unmutated IGHV-status and warrants further investigation. Importantly, this pipeline is readily generalizable to other diseases and enables mining of large cytogenetic databases in ISCN format with ease.